The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming
نویسنده
چکیده
In adaptive dynamic programming, neurocontrol and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimise a total cost function. In this paper we show that when discretized time is used to model the motion of the agent, it can be very important to do “clipping” on the motion of the agent in the final time step of the trajectory. By clipping we mean that the final time step of the trajectory is to be truncated such that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum; and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms which use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include Backpropagation Through Time for Control, and methods based on Dual Heuristic Dynamic Programming. However the clipping problem does not significantly affect methods based on Heuristic Dynamic Programming, Temporal Differences or Policy Gradient Learning algorithms. Similarly, the clipping problem does not affect fixedlength finite-horizon problems.
منابع مشابه
Adaptive fuzzy sliding mode and indirect radial-basis-function neural network controller for trajectory tracking control of a car-like robot
The ever-growing use of various vehicles for transportation, on the one hand, and the statistics ofsoaring road accidents resulting from human error, on the other hand, reminds us of the necessity toconduct more extensive research on the design, manufacturing and control of driver-less intelligentvehicles. For the automatic control of an autonomous vehicle, we need its dynamic...
متن کاملDesigning stable neural identifier based on Lyapunov method
The stability of learning rate in neural network identifiers and controllers is one of the challenging issues which attracts great interest from researchers of neural networks. This paper suggests adaptive gradient descent algorithm with stable learning laws for modified dynamic neural network (MDNN) and studies the stability of this algorithm. Also, stable learning algorithm for parameters of ...
متن کاملAdaptive aggregate production planning with fuzzy goal programming approach
Aggregate production planning (APP) determines the optimal production plan for the medium term planning horizon. The purpose of the APP is effective utilization of existing capacities through facing the fluctuations in demand. Recently, fuzzy approaches have been applied for APP focusing on vague nature of cost parameters. Considering the importance of coping with customer demand in different p...
متن کاملPerformance Analysis of Dynamic and Static Facility Layouts in a Stochastic Environment
In this paper, to cope with the stochastic dynamic (or multi-period) problem, two new quadratic assignment-based mathematical models corresponding to the dynamic and static approaches are developed. The product demands are presumed to be dependent uncertain variables with normal distribution having known expectation, variance, and covariance that change from one period to the next one, randomly...
متن کاملDecision making in forest management with consideration of stochastic prices
The optimal harvesting policy is calculated as a function of the entering stock, the price state, the harvesting cost, and the rate of interest in the capital market. In order to determine the optimal harvest schedule, the growth function and stumpage price process are estimated for the Swedish mixed species forests. The stumpage price is assumed to follow a stochastic Markov process. A stoch...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1302.5565 شماره
صفحات -
تاریخ انتشار 2013